Transcriptomes of aging brain, heart, muscle, and spleen from female and male African turquoise killifish

The African turquoise killifish is an emerging vertebrate model organism with great potential for aging research due to its naturally short lifespan. Thus far, turquoise killifish aging ‘omic’ studies have examined a single organ, single sex and/or evaluated samples from non-reference strains. Here, we describe a resource dataset of ribosomal RNA-depleted RNA-seq libraries generated from the brain, heart, muscle, and spleen from both sexes, as well as young and old animals, in the reference GRZ turquoise killifish strain. We provide basic quality control steps and demonstrate the utility of our dataset by performing differential gene expression and gene ontology analyses by age and sex. Importantly, we show that age has a greater impact than sex on transcriptional landscapes across probed tissues. Finally, we confirm transcription of transposable elements (TEs), which are highly abundant and increase in expression with age in brain tissue. This dataset will be a useful resource for exploring gene and TE expression as a function of both age and sex in a powerful naturally short-lived vertebrate model.

RNa isolation. For RNA isolation, frozen tissues (30-50 mg) were placed in MP biomedicals lysis matrix D tubes (CAT#6913500) filled with 1 mL of Trizol reagent (Thermo-Fisher), then homogenized using Benchmark BeadBug 6. Total RNA was purified using Direct-zol RNA Miniprep Plus Kit (Zymo cat# R2072) following the manufacturer's instructions. RNA quality was assessed using high sensitivity RNA screen tapes (Agilent cat# 5067-5579, 5067-5580) on Agilent Tapestation 4200 to obtain the RNA Integrity Number (RIN). Samples with a RIN score of <4 were discarded, which excluded 8/20 liver samples including all old male samples. Due to the high number of samples that did not pass QC, which would compromise our ability to measure some of the biological groups, we chose not to proceed with liver RNA-seq library preparation.
RNa-Seq library preparation and sequencing. We used 40 ng of total RNA, which was subjected to ribosomal-RNA depletion using the RiboGone ™ -Mammalian kit (Clontech cat# 634847) according to the manufacturer's protocol. Strand specific RNA-seq libraries were then constructed using the SMARTer Stranded RNA-seq Kit (Clontech), according to the manufacturer's protocol. Libraries were quality controlled using high sensitivity D1000 screen tapes (Agilent cat# 5067-5585, 5067-5603) on Agilent Tapestation 4200 before multiplexing the libraries for sequencing. Some samples were lost at this stage, as no library could be recovered, i.e. Fig. 1 Experimental design and analytical pipeline. The experimental design used to generate our RNA-seq dataset. Brain, heart, muscle, and spleen were dissected from sets of 5 young female, young male, old female, and old male GRZ strain killifish. RNA was extracted, depleted of ribosomal reads, and sequenced. After sequencing, reads were mapped to a turquoise killifish genome reference and counted with TETranscripts. The ratio of TE to gene reads was compared for each library, groups were contrasted for similarity using PCA analysis, differential gene expression was run using DESeq 2, and gene ontology analysis was run using clusterProfiler.
To determine the ratio of reads mapped to introns and exons, we used featureCounts version 2.0.4 41 to summarize the number of reads mapped to the exon level and gene level with the killifish reference gene annotation, respectively. The number of intronic reads was determined by subtracting the sum of exonic reads from the sum of reads mapped to gene features 42 .
Transposable element read ratio. To determine the ratio of reads contributed by TE regions, the Tetranscripts summarized count matrices were imported into R version 4.3.0 43 . The sum of reads mapped to TE features was divided by the total sum of reads in each tissue samples respectively. Non-parametric Mann-Whitney rank test was used to determine whether there was a statistically significant difference in TE ratio grouped by sex and age with ggpubr version 0.6.0 44 and false discovery rate [FDR] was reported to correct for multiple testing.
Differential gene expression analysis & transcriptional read correlation. The TETranscripts summarized count matrices were imported into R version 4.3.0 and differential gene expression analysis was conducted using DESeq 2 version 1.40.1 45 with sex and age as modeling variables. Normalized count matrices, variance-stabilized count matrices and differential gene expression result matrices were generated. Full list of differential analysis result by sex and age are provided (Supplemental Table S2). Transcriptome-wide correlation of reads mapped to gene and TE features was determined by assessing the pair-wise Spearman rank correlation between each sample pair. We also used principal component analysis on the variance-stabilized count matrices to determine the overall separation of samples across tissue types, as a function of age and sex. TE features were further classified into LINEs, SINEs, LTRs, DNA TEs, unclear, and unknown as provided in FishTEDB 38 . Unclear and unknown categories were collapsed under one single unknown category. The numbers of differentially expressed TE by age and sex within each category were reported for each of the four tissues.

Variance partition analysis.
To determine the amount of variance that could be explained by sex and age, the variance-stabilized count matrices were first split into TE and canonical gene count matrix. R package varian-cePartition version 1.30.0 46 was used to determine the amount of variance explained by sex and age in TE and canonical gene count matrices respectively.

Gene ontology analysis.
To determine the biological pathways that were significantly altered in aging and pathways that were implicated in sex dimorphism, we performed GSEA (gene set enrichment analysis) GO analysis 47 . As described in Teefy et al. 17 , turquoise killifish protein sequences from GCA_014300015.1 were aligned to the Ensembl release 104 human protein database using BLASTP 48 (NCBI BLAST version 2.13.0). The top human protein sequence for each turquoise killifish hit was retained using a minimal E-value cutoff of 10 −3 and used for GSEA. Although this E-value threshold may seem lenient, it is accepted for the comparison of species as evolutionary distant as the turquoise killifish and humans 30,49,50 . The results of the differential gene expression analysis with respect to sex and age were used as inputs for GSEA for each tissue. Killifish reference gene annotations were substituted with human homolog when possible and genes without human homologs that were able to pass the BLASTP filter were discarded. When multiple genes map to the same human homolog, the log-two-fold change were averaged. Genes were then sorted in a decreasing order with respect to the log-two-fold change. GSEA was performed using the R package clusterProfiler version 4.8.1 51 and human gene ontology database org.Hs.eg.db version 3.17.0 52 . GSEA was run using a minimum gene set of 25 terms and a maximum gene set of 5,000 terms using an FDR threshold of 5%. Full list of GSEA results by tissues with respect to age and sex are provided (Supplemental Table S3).

Data Records
Sequencing data was submitted to the Sequence Read Archive and is accessible through SRA accession SRP430823 (Transcriptional profiling of aging tissues from African turquoise killifish) 53 . Accession for individual samples is provided in Supplemental Table S1. www.nature.com/scientificdata www.nature.com/scientificdata/

technical Validation
Experimental design and quality control. We generated ribosomal RNA-depleted RNA-seq libraries from the brain, heart, muscle, and spleen from young (6-weeks-old) and old (16-weeks-old) male and female GRZ strain turquoise killifish starting from 5 fish per biological group (Fig. 1a,Supplemental Table S1). Importantly, each euthanized fish contributed all profiled tissues to minimize the number of subjects, as well as ultimately to potentially identify transcriptional signatures common to particular subjects across multiple tissues (i.e. brain, heart, muscle, and spleen samples from individual GRZ-AD_8240; Supplemental Table S1). Due to library construction failure, one young female muscle sample and one young female spleen sample were not sequenced (Supplemental Table S1).
We began to assess library quality by analyzing the number of reads in each library (Fig. 2a, Supplemental Table S4). Each library had roughly the same number of counts with no systemic bias towards any groups. Of note, the brain libraries consistently had the fewest total reads, although read counts were comparable across brain libraries. Next, we performed FastQC analysis using the MultiQC tool on each RNA-seq library to determine the mean quality scores for each sample (Fig. 2b). Quality scores for each library consistently had a Phred score > = 33 for the length of the read thereby, showing we generated high-quality RNA-seq libraries.
After confirming high-quality libraries, we mapped each RNA-seq library to a recently published killifish genome version 36 that was softmasked with turquoise killifish TE sequences from FishTEDB 38 . First, we measured the intronic/exonic read ratio using featureCounts and observed that roughly half of all reads were intronic and half were exonic. The brain and spleen had the highest ratio of intronic reads while the heart and muscle had www.nature.com/scientificdata www.nature.com/scientificdata/ the lowest (Supplemental Table S5). Importantly, overall mapped fractions (including both uniquely mapped and multi-mapped reads) were high and consistent across libraries (>90%), although some libraries had higher multi-mapping rates (Fig. 2c). Interestingly, multi-mapping reads are likely to stem from repetitive regions of the genome (including TEs), which represent a large portion of the African turquoise killifish genome 54 .
Next, to capture both gene and TE counts, we generated read counts for genes and TEs using TETranscripts, as in Teefy et al. 17 . After generating count matrices consisting of gene and TE counts, we normalized reads in DESeq 2 and created transcript expression correlation maps between libraries (Fig. 2c). We found that samples clustered tightly by tissue, consistent with strong expected tissue-specific transcript expression. To assess transcriptional similarity of various samples in each tissue, we performed principal component analysis (PCA) on each count matrices normalized with the Variance Stabilizing Transformation in DESeq 2 ( Fig. 3a-d). In all tissues, transcript expression tended to segregate mostly by age, with a lesser secondary separation by sex. To quantify how much transcript expression variation in each tissue could be explained by age and sex, we used "variancePartition" (Fig. 3e-f). Interestingly, in each tissue, age accounted for more variance in gene expression than sex for both genes (Fig. 3e) and TEs (Fig. 3f).

Differential transcription by age and sex.
To assess the quality and useability of the dataset, we next performed differential gene expression analysis using DESeq 2, starting by using only genes, and then with TEs (see below; Supplemental Table S2). We used a combined differential expression model with animal age and sex as modeling covariates. Using a significance threshold of FDR ≤ 5%, we identified substantial age-related gene plots of transcript expression highlighted by group (young female, young male, old female, old male) for (a) brain (b) heart (c) muscle (d) spleen. In each tissue besides the spleen, transcript expression segregates by age along PC1. In the spleen, samples still separate by age but primarily along PC2. (e) Variance in gene expression explained by age and sex in each tissue. In each tissue, age explains more of the variance in gene expression relative to sex. (f) Variance in TE expression explained by age and sex. In each tissue, age explains more of the variance in TE expression relative to sex.
www.nature.com/scientificdata www.nature.com/scientificdata/ expression changes with 3611, 4910, 5077, and 2195 differentially expressed genes in brain, heart, muscle, and spleen, respectively (Fig. 4a). These numbers are consistent with the number of differentially expressed genes with aging observed in previous transcriptomic studies of aging in this species with single tissues, single sex and/ or in a non-standard strain 17,31,55 . In agreement with our PCA analysis, we find fewer genes with sex-dimorphic expression in each tissue with 0, 429, 30, and 13 differentially expressed genes between females and males in the brain, heart, muscle, and spleen, respectively (Fig. 4b).
Next, we analyzed the differential expression of TEs in these tissues. We found that as a percentage of mapped reads in each library, reads mapping to TEs ranges varied strongly by tissue, with ~50% of all reads in brain libraries mapping to TEs and only <20% of reads mapping to TEs in muscle libraries (Fig. 5a). TEs were more differentially expressed by age rather than sex with 897, 706, 291, and 114 differentially expressed TEs in brain, heart, muscle, and spleen, respectively. Most tissues had an approximately equal proportion of up-and down-regulated TEs except the brain, which showed a strong bias for TE upregulation with age (Fig. 5b). Like genes, TEs had more limited sex-dimorphic expression compared to age-related expression with only 15, 14, 0, and 1 differentially expressed TEs between sexes in brain, heart, muscle, and spleen, respectively (Fig. 5c). Brains had the most amount of differentially expressed TEs by sex and by age (Fig. 5b,c). When TEs were segmented into their respective subfamilies, LINE TEs were the most upregulated TE family in both the aging brain and in female brains (Fig. 5d,e).
Lastly, we performed gene set enrichment analysis (GSEA) using gene ontology (GO) functional categories (using homology mapping from human annotations), to determine whether our dataset was amenable to this type of analysis (Supplemental Table S3). GO enrichment analysis was performed in each tissue, to determine enrichment as a function of age (Fig. 6a) and as a function of sex (Fig. 6b). As reported in previous aging 'omic' studies across animal taxa 2,56 , at least one immune-related term was enriched in aged tissues compared to young tissues (Fig. 6a), consistent with the notion of "inflamm-aging". Importantly, young muscle also showed an enrichment of cell-cycle gene transcription, which may reflect more active or abundant muscle stem cells. All tissues displayed enough transcriptional sex-dimorphism to have at least 5 significantly enriched GO terms per sex, except for the female spleen, which only showed increased interferon production relative to the male spleen (Fig. 6b).

Usage Notes
This dataset can be used to find differences in gene and TE expression using age and sex as variables in any combination suitable to the user. In addition, to facilitate the exploration of this dataset, we have deployed a user-friendly searchable database of differential gene and TE expression results, with human homology information, that can be mined by the community (https://alanxu-usc.shinyapps.io/nf_interactive_db/). The dataset could also be deconvoluted using single-cell atlases to establish cell composition profiles and analyze how cell type frequencies change with age and sex in each tissue. a Strip plot of differential gene expression by age b Strip plot of differential gene expression by sex with the number of significantly differential genes in parentheses. In each tissue, thousands of genes are differentially expressed by age. The muscle was the tissue most affected by age at the transcriptional level with 5,077 differentially expressed genes. Brown denotes genes upregulated in old tissues, yellow denotes genes upregulated in young tissues, gray denotes non-significant differences in gene expression between ages. (b) Strip plot showing the number of differentially expressed genes by sex (FDR ≤ 5%) with the number of significantly differential genes in parentheses. Fewer genes are differentially expressed by sex than age in all tissues assayed. The heart had the most sex-dimorphic gene expression with 429 genes differentially expressed by sex. Pink denotes genes upregulated in female tissues, blue denotes genes upregulated in male tissues, gray signifies genes with no significant expression differences between sexes.
www.nature.com/scientificdata www.nature.com/scientificdata/   www.nature.com/scientificdata www.nature.com/scientificdata/ Limitations of this dataset are that, like most aging -omic studies outside of consortia efforts, it uses only 2 timepoints, which limits ability to enable detection of specific changes at middle-age 17 . Future transcriptomic studies of female vs. male turquoise killifish focusing on specific tissues may benefit from increased time resolution. The study also only looks into limited somatic tissues, namely brain, heart, muscle and spleen. Future studies including additional somatic tissues will be useful to expand our knowledge of sex-differences in aging turquoise killifish tissues. In addition, TE quantification may be partially driven by TE-derived intronic reads that are retained by ribosomal RNA-depleted RNA-seq library preparation 60 . In effect, this dataset cannot distinguish between intronic-derived TEs and autonomous TEs, which are regulated in a different fashion, although both may contribute to biological changes. Nonetheless, this dataset is useful in determining the total amount and class of TE reads present in young and old tissues across sexes.

code availability
All analytical code used for processing and technical validation is available on the Benayoun Laboratory GitHub repository (https://github.com/BenayounLaboratory/Killifish_RNASeq_2023). The provided R code was run and tested on R v4.3.0. acknowledgements Some figure elements were generated with BioRender (https://biorender.com) and Freepik. We would like to acknowledge the Center for Advanced Research Computing (CARC) at USC for providing the computational resources used to perform many of the analyses used in this study. This work was supported by a National Institute on Aging (NIA) T32 AG052374 postdoctoral training grant fellowship to B.B.T. and a predoctoral fellowship to R.J.L., grant R35 GM142395 from National Institute of General Medical Sciences, a pilot grant from the NAVIGAGE Foundation, and a Hanson-Thorell Family award to B.A.B.

completing interests
The authors declare no competing interests.